We present Pre-trained Machine Reader (PMR), a novel method for retrofitting Pre-trained Language Models (PLMs) into Machine Reading Comprehension (MRC) models without acquiring labeled data. PMR resolves the discrepancy between the pre-training and downstream fine-tuning of existing PLMs, and provides a unified solver for various extraction tasks. To achieve this, we construct a large volume of general-purpose, high-quality MRC-style training data with the help of Wikipedia hyperlinks and design a Wiki Anchor Extraction task to guide the MRC-style pre-training. Although conceptually simple, PMR is particularly effective at extraction tasks such as Extractive Question Answering and Named Entity Recognition, where it shows substantial improvements over previous approaches, especially in low-resource settings. Moreover, by viewing sequence classification as a special case of extraction in our MRC formulation, PMR can even extract high-quality rationales to explain its classification decisions, providing greater explainability of the predictions.
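As a hedged illustration of this unified MRC formulation, the sketch below shows how both tasks can share one input layout; the function name and the exact [CLS]/[SEP] layout are assumptions for exposition, not PMR's actual preprocessing code.

```python
def build_mrc_input(query: str, context: str) -> str:
    """Cast an extraction task as MRC: a task-specific query is paired with
    the context, and answers are predicted as spans of the context."""
    return f"[CLS] {query} [SEP] {context} [SEP]"

# Extractive QA: the question itself serves as the query.
print(build_mrc_input("Who invented the telephone?",
                      "The telephone was patented by Alexander Graham Bell."))

# NER: a natural-language definition of the entity type serves as the query,
# and every matching entity mention is an answer span.
print(build_mrc_input("Person: the name of a human being.",
                      "Alexander Graham Bell met Thomas Watson in Boston."))
```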
When deploying deep neural networks (DNNs) on edge devices, much research effort is devoted to coping with limited hardware resources, but little attention is paid to the influence of dynamic power management. Because edge devices typically run on a limited battery energy budget (rather than the nearly unlimited energy supply of servers or workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework designed to work with DVFS-based dynamic power management. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By reconfiguring the model to the pruning ratio corresponding to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., we keep the difference in speed across execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
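A minimal sketch of the reconfiguration idea, assuming a hypothetical frequency-to-ratio lookup table and top-k selection by soft-mask score; the real framework learns the masks jointly with the weights and tunes the table per device.

```python
import numpy as np

# Hypothetical mapping from DVFS execution frequency to pruning ratio.
FREQ_TO_RATIO = {1.8e9: 0.00, 1.2e9: 0.50, 0.6e9: 0.75}

def configure(weights, mask_scores, ratio):
    """Derive a subnetwork from one shared weight set: keep the top
    (1 - ratio) fraction of weights ranked by their soft-mask scores."""
    if ratio == 0.0:
        return weights
    k = int(weights.size * (1.0 - ratio))
    threshold = np.sort(mask_scores.ravel())[-k]
    return np.where(mask_scores >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
w, scores = rng.normal(size=(64, 64)), rng.random((64, 64))
for freq, ratio in FREQ_TO_RATIO.items():
    sub = configure(w, scores, ratio)
    print(f"{freq / 1e9:.1f} GHz -> pruning ratio {ratio}, "
          f"{np.count_nonzero(sub)} nonzero weights")
```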
We propose an analysis-by-synthesis method for fast multi-view 3D reconstruction of opaque objects with arbitrary materials and illumination. State-of-the-art methods use both neural surface representations and neural rendering. While flexible, neural surface representations are a significant bottleneck in optimization runtime. Instead, we represent surfaces as triangle meshes and build a differentiable rendering pipeline around triangle rasterization and neural shading. The renderer is used in a gradient descent optimization where both a triangle mesh and a neural shader are jointly optimized to reproduce the multi-view images. We evaluate our method on a public 3D reconstruction dataset and show that it can match the reconstruction accuracy of traditional baselines and neural approaches while surpassing them in optimization runtime. Additionally, we investigate the shader and find that it learns an interpretable representation of appearance, enabling applications such as 3D material editing.
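The joint optimization loop can be sketched as follows; the rendering step here is a deliberately simplified stand-in (a plain per-point shading network), since a faithful example would require a differentiable triangle rasterizer.

```python
import torch

vertices = torch.randn(100, 3, requires_grad=True)             # mesh vertices
shader = torch.nn.Sequential(                                  # tiny neural shader
    torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
target = torch.rand(100, 3)                # stand-in for multi-view image pixels
opt = torch.optim.Adam([vertices, *shader.parameters()], lr=1e-2)

for step in range(200):
    rendered = shader(vertices)                # "render": shade each surface point
    loss = (rendered - target).pow(2).mean()   # photometric loss vs. captured images
    opt.zero_grad()
    loss.backward()                            # gradients flow to mesh and shader
    opt.step()
```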
Low-dose computed tomography (CT) plays a significant role in reducing radiation risk in clinical applications. However, lowering the radiation dose significantly degrades image quality. The rapid development and wide application of deep learning have opened new directions for low-dose CT imaging algorithms. We therefore propose a fully unsupervised one-sample diffusion model (OSDM) in the projection domain for low-dose CT reconstruction. To extract sufficient prior information from a single sample, the Hankel matrix formulation is employed. In addition, penalized weighted least-squares and total variation regularization are introduced to achieve superior image quality. Specifically, we first train a score-based generative model on one sinogram, extracting a large number of tensors from the structural-Hankel matrix as network inputs to capture the prior distribution. At the inference stage, the stochastic differential equation solver and a data-consistency step are performed iteratively to obtain the sinogram data, and the final image is obtained through the filtered back-projection algorithm. The reconstructed results approach the quality of their normal-dose counterparts, demonstrating that OSDM is a practical and effective model for reducing artifacts while preserving image quality.
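A minimal sketch of the patch-extraction idea behind the structural-Hankel formulation, with a 1D toy signal standing in for real sinogram data; the paper's construction is more elaborate.

```python
import numpy as np

def hankel_rows(signal: np.ndarray, window: int) -> np.ndarray:
    """Form a Hankel-structured matrix whose rows are overlapping windows of
    the signal; the many rows extracted from one sample serve as network
    inputs for learning the prior distribution."""
    n = len(signal) - window + 1
    return np.stack([signal[i:i + window] for i in range(n)])

sinogram_row = np.sin(np.linspace(0, 4 * np.pi, 128))  # toy projection data
H = hankel_rows(sinogram_row, window=32)
print(H.shape)  # (97, 32): many training tensors from a single sample
```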
Recent advances have shown that deep neural networks (DNNs) are vulnerable to adversarial perturbations, so it is necessary to evaluate the robustness of advanced DNNs with adversarial attacks. However, traditional physical attacks that use stickers as perturbations are more easily defeated than the recent light-based physical attacks. In this work, we propose a projector-based physical attack called Adversarial Color Projection (AdvCP), which performs the attack by manipulating the physical parameters of the projected light. Experiments demonstrate the effectiveness of our method in both digital and physical environments. The results show that the proposed method achieves excellent attack transferability, which gives AdvCP effective black-box attack capability. We discuss the threat AdvCP poses to future vision-based systems and applications, and put forward some ideas for light-based physical attacks.
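A hedged sketch of simulating such an attack digitally: the alpha-blending model and parameter names are illustrative assumptions, and a real attack would search these physical parameters against a target classifier.

```python
import numpy as np

def project_color(image: np.ndarray, color, intensity: float) -> np.ndarray:
    """Simulate a projector casting colored light on the scene by alpha-
    blending the image with the projected color; `color` and `intensity`
    are the physical parameters the attacker manipulates."""
    return np.clip((1 - intensity) * image + intensity * np.asarray(color),
                   0.0, 1.0)

img = np.random.default_rng(0).random((224, 224, 3))  # toy input in [0, 1]
adv = project_color(img, color=(1.0, 0.1, 0.1), intensity=0.3)
# A black-box attack would tune (color, intensity) to flip the classifier.
```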
As real-world graphs expand, larger GNN models with billions of parameters are being deployed. The high parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computation and memory cost of GNNs, optimization methods such as pruning redundant nodes and edges in the input graph are commonly adopted. However, model compression that directly targets sparsity in the model's weight layers has been mostly limited to traditional deep neural networks (DNNs) for tasks such as image classification and object detection. In this paper, we leverage two state-of-the-art model compression approaches, (1) train-and-prune and (2) sparse training, to sparsify the weight layers of GNNs. We evaluate and compare the efficiency of both approaches in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the IA-Email, Wiki-Talk, and Stackoverflow datasets for link prediction, sparse training achieves accuracy comparable to the train-and-prune approach with lower training FLOPs. On the Brain dataset for node classification, sparse training uses fewer FLOPs (less than 1/7 of those used by train-and-prune) and preserves much better accuracy under extreme model sparsity.
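As a hedged sketch, magnitude pruning of a single GNN weight layer might look as follows; train-and-prune applies this once after training, whereas sparse training maintains such a mask throughout training.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of a weight layer, keeping
    a (1 - sparsity) fraction of the weights."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight,
                       torch.zeros_like(weight))

w = torch.randn(256, 256)                      # e.g., one message-passing layer
sparse_w = magnitude_prune(w, sparsity=6 / 7)  # keep roughly 1/7 of the weights
print((sparse_w != 0).float().mean())
```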
It is well known that the performance of deep neural networks (DNNs) is vulnerable to subtle perturbations. So far, camera-based physical adversarial attacks have not attracted much attention, leaving a vacancy in physical attacks. In this paper, we propose a simple and effective camera-based physical attack called Adversarial Color Film (AdvCF), which manipulates the physical parameters of a color film to perform the attack. Carefully designed experiments show the effectiveness of the proposed method in both digital and physical environments. Furthermore, the experimental results show that the adversarial samples generated by AdvCF have excellent attack transferability, which enables effective black-box attacks. Meanwhile, we provide defense guidance against AdvCF through adversarial training. Finally, we investigate the threat AdvCF poses to vision-based systems and propose some promising directions for camera-based physical attacks.
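A hedged sketch of simulating the color film digitally; the per-channel transmittance model is an assumption chosen for illustration, not the paper's exact formulation.

```python
import numpy as np

def color_film(image: np.ndarray, transmittance) -> np.ndarray:
    """Simulate a color film placed in front of the camera lens: each RGB
    channel is attenuated by the film's transmittance, the physical
    parameters the attack tunes."""
    return np.clip(image * np.asarray(transmittance), 0.0, 1.0)

img = np.random.default_rng(1).random((224, 224, 3))  # toy input in [0, 1]
adv = color_film(img, transmittance=(0.9, 0.4, 0.7))  # reddish-purple film
```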
Deep neural networks (DNNs) have been widely used in computer vision tasks such as image classification, object detection, and segmentation. However, recent studies have shown their vulnerability to manual digital perturbations or distortions of the input images. The accuracy of a network is greatly influenced by the data distribution of its training dataset, and scaling an original image creates out-of-distribution data, making it an adversarial attack capable of fooling networks. In this work, we propose a scaling-distortion dataset, ImageNet-CS, built by scaling a subset of the ImageNet challenge dataset by different multiples. The goal of our work is to study the impact of scaled images on the performance of advanced DNNs. We conduct experiments with several state-of-the-art deep neural network architectures on the proposed ImageNet-CS, and the results show a significant positive correlation between scaling size and accuracy degradation. Moreover, based on the ResNet50 architecture, we test some recently proposed robust training techniques and strategies, such as AugMix, Revisiting, and Normalizer-Free, on ImageNet-CS. The experimental results show that these robust training techniques can improve networks' robustness to scaling transformations.
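A minimal sketch of the scaling corruption, assuming a center-scale-then-crop/pad construction; the exact recipe used to build ImageNet-CS may differ.

```python
from PIL import Image

def center_scale(img: Image.Image, factor: float) -> Image.Image:
    """Resize the image content by `factor`, then paste it centered on a
    canvas of the original size (cropping when factor > 1, padding when
    factor < 1), yielding an out-of-distribution variant."""
    w, h = img.size
    scaled = img.resize((int(w * factor), int(h * factor)), Image.BILINEAR)
    canvas = Image.new(img.mode, (w, h))
    canvas.paste(scaled, ((w - scaled.width) // 2, (h - scaled.height) // 2))
    return canvas

# Hypothetical usage: one corrupted copy per scaling multiple.
# variants = [center_scale(img, f) for f in (1.5, 2.0, 3.0)]
```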
Transformers are considered one of the most important deep learning models since 2018, partly because they have established state-of-the-art (SOTA) records and show potential to replace existing deep neural networks (DNNs). Despite these remarkable successes, the prolonged turnaround time of Transformer models is a widely recognized obstacle. The diversity of sequence lengths imposes additional computation overhead, since inputs must be zero-padded to the maximum sentence length in a batch to accommodate parallel computing platforms. This paper targets field-programmable gate arrays (FPGAs) and proposes a coherent sequence-length-adaptive algorithm-hardware co-design for Transformer acceleration. In particular, we develop a hardware-friendly sparse attention operator and a length-aware hardware resource scheduling algorithm. The proposed sparse attention operator reduces the complexity of attention-based models to linear and alleviates off-chip memory traffic. The proposed length-aware resource scheduling algorithm dynamically allocates hardware resources to fill pipeline slots and eliminates bubbles for NLP tasks. Experiments show that our design has very small accuracy loss, achieves 80.2$\times$ and 2.6$\times$ speedup compared with CPU and GPU implementations, respectively, and is 4$\times$ more energy efficient than state-of-the-art GPU accelerators optimized via CUBLAS GEMM.
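A hedged sketch of a band-sparse attention operator of this kind: a simple sliding-window pattern stands in for the paper's hardware-friendly design, but it shows why the cost becomes linear in sequence length.

```python
import torch

def windowed_attention(q, k, v, window: int = 8) -> torch.Tensor:
    """Each token attends only to a local window of neighbors, reducing
    attention cost from O(n^2 * d) to O(n * window * d)."""
    n, d = q.shape
    out = torch.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = (q[i] @ k[lo:hi].T) / d ** 0.5
        out[i] = torch.softmax(scores, dim=-1) @ v[lo:hi]
    return out

q = k = v = torch.randn(64, 32)           # 64 tokens, 32-dim heads
print(windowed_attention(q, k, v).shape)  # torch.Size([64, 32])
```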
Previous unsupervised domain adaptation (UDA) methods aim to promote target learning via one-directional knowledge transfer from a label-rich source domain to an unlabeled target domain, while the reverse adaptation from target to source has so far not been jointly considered. In fact, in real teaching practice, a teacher helps students learn while also gaining promotion from the students to some extent, which inspires us to explore bidirectional knowledge transfer between domains; hence we propose a Dual-Correction Adaptation Network (DualCAN) in this paper. However, due to the asymmetric label knowledge across domains, transferring from the unlabeled target to the labeled source is more difficult than the common source-to-target counterpart. First, target pseudo-labels predicted by the source usually involve noise caused by model bias; consequently, in the reverse adaptation they may harm source performance and cause negative target transfer. Second, the source domain usually contains innate noise, which inevitably aggravates the target noise, leading to noise amplification across domains. To this end, we further introduce a Noise Identification and Correction (NIC) module to correct and recycle noise in both domains. To our knowledge, this is the first naive attempt at bidirectional adaptation for noisy UDA, and it naturally applies to noise-free UDA as well. Theoretical justification is given to illustrate the rationality of our intuition. Empirical results confirm the effectiveness of DualCAN, with remarkable performance gains over state-of-the-art methods, especially on extremely noisy tasks (e.g., Pw->Pr and Pr->Rw of Office-Home).
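A hedged sketch of one common noise-identification heuristic, the small-loss criterion; this is in the spirit of, but not identical to, the NIC module described above.

```python
import numpy as np

def small_loss_select(losses: np.ndarray, keep_ratio: float = 0.8) -> np.ndarray:
    """Treat the highest-loss fraction of (pseudo-)labeled samples as noisy
    and keep only the low-loss remainder, which can then be corrected or
    recycled before the reverse target-to-source adaptation step."""
    k = int(len(losses) * keep_ratio)
    return np.argsort(losses)[:k]  # indices of presumed-clean samples

losses = np.random.default_rng(2).exponential(size=1000)  # toy per-sample losses
clean_idx = small_loss_select(losses)
print(len(clean_idx), "samples kept as clean")
```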